SGI Freeware 1999 August

GNU Info File | 1998-05-21 | 47.0 KB | 976 lines

This is Info file ../../info/internals.info, produced by Makeinfo version 1.68 from the input file internals.texi.

Copyright (C) 1992 - 1996 Ben Wing. Copyright (C) 1996, 1997 Sun Microsystems. Copyright (C) 1994, 1995 Free Software Foundation. Copyright (C) 1994, 1995 Board of Trustees, University of Illinois.

Permission is granted to make and distribute verbatim copies of this manual provided the copyright notice and this permission notice are preserved on all copies.

Permission is granted to copy and distribute modified versions of this manual under the conditions for verbatim copying, provided that the entire resulting derived work is distributed under the terms of a permission notice identical to this one.

Permission is granted to copy and distribute translations of this manual into another language, under the above conditions for modified versions, except that this permission notice may be stated in a translation approved by the Foundation.

Permission is granted to copy and distribute modified versions of this manual under the conditions for verbatim copying, provided also that the section entitled "GNU General Public License" is included exactly as in the original, and provided that the entire resulting derived work is distributed under the terms of a permission notice identical to this one.

Permission is granted to copy and distribute translations of this manual into another language, under the above conditions for modified versions, except that the section entitled "GNU General Public License" may be included in a translation approved by the Free Software Foundation instead of in the original English.

File: internals.info, Node: Modules for Internationalization, Prev: Modules for Interfacing with X Windows, Up: A Summary of the Various XEmacs Modules

Modules for Internationalization
================================

     size  name
   -------  ---------------------
     42836  mule-canna.c
     16737  mule-ccl.c
     41080  mule-charset.c
     30176  mule-charset.h
    146844  mule-coding.c
     16588  mule-coding.h
      6996  mule-mcpath.c
      2899  mule-mcpath.h
     57158  mule-wnnfns.c
      3351  mule.c

These files implement the MULE (Asian-language) support. Note that MULE actually provides a general interface for all sorts of languages, not just Asian languages (although they are generally the most complicated to support). This code is still in beta.

`mule-charset.*' and `mule-coding.*' provide the heart of the XEmacs MULE support. `mule-charset.*' implements the "charset" Lisp object type, which encapsulates a character set (an ordered one- or two-dimensional set of characters, such as US ASCII or JISX0208 Japanese Kanji).

`mule-coding.*' implements the "coding-system" Lisp object type, which encapsulates a method of converting between different encodings. An encoding is a representation of a stream of characters, possibly from multiple character sets, using a stream of bytes or words, and defines (e.g.) which escape sequences are used to specify particular character sets, how the indices for a character are converted into bytes (sometimes this involves setting the high bit; sometimes complicated rearranging of the values takes place, as in the Shift-JIS encoding), etc.

`mule-ccl.c' provides the CCL (Code Conversion Language) interpreter. CCL is similar in spirit to Lisp byte code and is used to implement converters for custom encodings.

`mule-canna.c' and `mule-wnnfns.c' implement interfaces to external programs used to implement the Canna and WNN input methods, respectively. This is currently in beta.

`mule-mcpath.c' provides some functions to allow for pathnames containing extended characters.
This code is fragmentary, obsolete, and completely non-working. Instead, PATHNAME-CODING-SYSTEM is used to specify conversions of names of files and directories. The standard C I/O functions like `open()' are wrapped so that conversion occurs automatically.

`mule.c' provides a few miscellaneous things that should probably be elsewhere.

      9400  intl.c

This provides some miscellaneous internationalization code for implementing message translation and interfacing to the Ximp input method. None of this code is currently working.

      1764  iso-wide.h

This contains leftover code from an earlier implementation of Asian-language support, and is not currently used.

File: internals.info, Node: Allocation of Objects in XEmacs Lisp, Next: Events and the Event Loop, Prev: A Summary of the Various XEmacs Modules, Up: Top

Allocation of Objects in XEmacs Lisp
************************************

* Menu:

* Introduction to Allocation::
* Garbage Collection::
* GCPROing::
* Integers and Characters::
* Allocation from Frob Blocks::
* lrecords::
* Low-level allocation::
* Pure Space::
* Cons::
* Vector::
* Bit Vector::
* Symbol::
* Marker::
* String::
* Bytecode::

File: internals.info, Node: Introduction to Allocation, Next: Garbage Collection, Up: Allocation of Objects in XEmacs Lisp

Introduction to Allocation
==========================

Emacs Lisp, like all Lisps, has garbage collection. This means that the programmer never has to explicitly free (destroy) an object; it happens automatically when the object becomes inaccessible. Most experts agree that garbage collection is a necessity in a modern, high-level language. Its omission from C stems from the fact that C was originally designed to be a nice abstract layer on top of assembly language, for writing kernels and basic system utilities rather than large applications.

Lisp objects can be created by any of a number of Lisp primitives. Most object types have one or a small number of basic primitives for creating objects.
For conses, the basic primitive is `cons'; for vectors, the primitives are `make-vector' and `vector'; for symbols, the primitives are `make-symbol' and `intern'; etc. Some Lisp objects, especially those that are primarily used internally, have no corresponding Lisp primitives. Every Lisp object, though, has at least one C primitive for creating it.

Recall from section (VII) that a Lisp object, as stored in a 32-bit or 64-bit word, has a mark bit, a few tag bits, and a "value" that occupies the remainder of the bits. We can separate the different Lisp object types into four broad categories:

   * (a) Those for whom the value directly represents the contents of the Lisp object. Only two types are in this category: integers and characters. No special allocation or garbage collection is necessary for such objects. Lisp objects of these types do not need to be `GCPRO'ed.

In the remaining three categories, the value is a pointer to a structure.

   * (b) Those for whom the tag directly specifies the type. Recall that there are only three tag bits; this means that at most five types can be specified this way. The most commonly-used types are stored in this format; this includes conses, strings, vectors, and sometimes symbols. With the exception of vectors, objects in this category are allocated in "frob blocks", i.e. large blocks of memory that are subdivided into individual objects. This saves a lot on malloc overhead, since there are typically quite a lot of these objects around, and the objects are small. (A cons, for example, occupies 8 bytes on 32-bit machines - 4 bytes for each of the two objects it contains.) Vectors are individually `malloc()'ed since they are of variable size. (It would be possible, and desirable, to allocate vectors of certain small sizes out of frob blocks, but it isn't currently done.)
     Strings are handled specially: Each string is allocated in two parts, a fixed size structure containing a length and a data pointer, and the actual data of the string. The former structure is allocated in frob blocks as usual, and the latter data is stored in "string chars blocks" and is relocated during garbage collection to eliminate holes.

In the remaining two categories, the type is stored in the object itself. The tag for all such objects is the generic "lrecord" (Lisp_Record) tag. The first four bytes (or eight, for 64-bit machines) of the object's structure are a pointer to a structure that describes the object's type, which includes method pointers and a pointer to a string naming the type. Note that it's possible to save some space by using a one- or two-byte tag, rather than a four- or eight-byte pointer to store the type, but it's not clear it's worth making the change.

   * (c) Those lrecords that are allocated in frob blocks (see above). This includes the objects that are most common and relatively small, and includes floats, bytecodes, symbols (when not in category (b)), extents, events, and markers. With the cleanup of frob blocks done in 19.12, it's not terribly hard to add more objects to this category, but it's a bit trickier than adding an object type to type (d) (esp. if the object needs a finalization method), and is not likely to save much space unless the object is small and there are many of them. (In fact, if there are very few of them, it might actually waste space.)

   * (d) Those lrecords that are individually `malloc()'ed. These are called "lcrecords". All other types are in this category. Adding a new type to this category is comparatively easy, and all types added since 19.8 (when the current allocation scheme was devised, by Richard Mlynarik), with the exception of the character type, have been in this category.

Note that bit vectors are a bit of a special case.
They are simple lrecords as in category (c), but are individually `malloc()'ed like vectors. You can basically view them as exactly like vectors except that their type is stored in lrecord fashion rather than in directly-tagged fashion.

Note that FSF Emacs redesigned their object system in 19.29 to follow a similar scheme. However, given RMS's expressed dislike for data abstraction, the FSF scheme is not nearly as clean or as easy to extend. (FSF calls items of type (c) `Lisp_Misc' and items of type (d) `Lisp_Vectorlike', with separate tags for each, although `Lisp_Vectorlike' is also used for vectors.)

File: internals.info, Node: Garbage Collection, Next: GCPROing, Prev: Introduction to Allocation, Up: Allocation of Objects in XEmacs Lisp

Garbage Collection
==================

Garbage collection is simple in theory but tricky to implement. Emacs Lisp uses the oldest garbage collection method, called "mark and sweep". Garbage collection begins by starting with all accessible locations (i.e. all variables and other slots where Lisp objects might occur) and recursively traversing all objects accessible from those slots, marking each one that is found. We then go through all of memory, freeing each object that is not marked and unmarking each object that is marked. Note that "all of memory" means all currently allocated objects. Traversing all these objects means traversing all frob blocks, all vectors (which are chained in one big list), and all lcrecords (which are likewise chained).

Note that, when an object is marked, the mark has to occur inside of the object's structure, rather than in the 32-bit `Lisp_Object' holding the object's pointer; i.e. you can't just set the pointer's mark bit. This is because there may be many pointers to the same object. This means that the method of marking an object can differ depending on the type. The different marking methods are approximately as follows:

  1. For conses, the mark bit of the car is set.

  2. For strings, the mark bit of the string's plist is set.

  3. For symbols when not lrecords, the mark bit of the symbol's plist is set.

  4. For vectors, the length is negated after adding 1.

  5. For lrecords, the pointer to the structure describing the type is changed (see below).

  6. Integers and characters do not need to be marked, since no allocation occurs for them.

The details of this are in the `mark_object()' function.

Note that any code that operates during garbage collection has to be especially careful because of the fact that some objects may be marked and as such may not look like they normally do. In particular:

   * Some object pointers may have their mark bit set. This will make `FOOBARP()' predicates fail. Use `GC_FOOBARP()' to deal with this.

   * Even if you clear the mark bit, `FOOBARP()' will still fail for lrecords because the implementation pointer has been changed (see below). `GC_FOOBARP()' will correctly deal with this.

   * Vectors have their size field munged, so anything that looks at this field will fail.

   * Note that `XFOOBAR()' macros *will* work correctly on object pointers with their mark bit set, because the logical shift operations that remove the tag also remove the mark bit.

Finally, note that garbage collection can be invoked explicitly by calling `garbage-collect' but is also called automatically by `eval', once a certain amount of memory has been allocated since the last garbage collection (according to `gc-cons-threshold').

File: internals.info, Node: GCPROing, Next: Integers and Characters, Prev: Garbage Collection, Up: Allocation of Objects in XEmacs Lisp

`GCPRO'ing
==========

`GCPRO'ing is one of the ugliest and trickiest parts of Emacs internals. The basic idea is that whenever garbage collection occurs, all in-use objects must be reachable somehow or other from one of the roots of accessibility. The roots of accessibility are:

  1. All objects that have been `staticpro()'d. This is used for any global C variables that hold Lisp objects.
     A call to `staticpro()' happens implicitly as a result of any symbols declared with `defsymbol()' and any variables declared with `DEFVAR_FOO()'. You need to explicitly call `staticpro()' (in the `vars_of_foo()' method of a module) for other global C variables holding Lisp objects. (This typically includes internal lists and such things.)

     Note that `obarray' is one of the `staticpro()'d things. Therefore, all functions and variables get marked through this.

  2. Any shadowed bindings that are sitting on the specpdl stack.

  3. Any objects sitting in currently active (Lisp) stack frames, catches, and condition cases.

  4. A couple of special-case places where active objects are located.

  5. Anything currently marked with `GCPRO'.

Marking with `GCPRO' is necessary because some C functions (quite a lot, in fact), allocate objects during their operation. Quite frequently, there will be no other pointer to the object while the function is running, and if a garbage collection occurs and the object needs to be referenced again, bad things will happen. The solution is to mark those objects with `GCPRO'. Unfortunately this is easy to forget, and there is basically no way around this problem. Here are some rules, though:

  1. For every `GCPRON', there have to be declarations of `struct gcpro gcpro1, gcpro2', etc.

  2. You *must* `UNGCPRO' anything that's `GCPRO'ed, and you *must not* `UNGCPRO' if you haven't `GCPRO'ed. Getting either of these wrong will lead to crashes, often in completely random places unrelated to where the problem lies.

  3. The way this actually works is that all currently active `GCPRO's are chained through the `struct gcpro' local variables, with the variable `gcprolist' pointing to the head of the list and the nth local `gcpro' variable pointing to the first `gcpro' variable in the next enclosing stack frame. Each `GCPRO'ed thing is an lvalue, and the `struct gcpro' local variable contains a pointer to this lvalue.
     This is why things will mess up badly if you don't pair up the `GCPRO's and `UNGCPRO's - you will end up with `gcprolist's containing pointers to `struct gcpro's or local `Lisp_Object' variables in no-longer-active stack frames.

  4. It is actually possible for a single `struct gcpro' to protect a contiguous array of any number of values, rather than just a single lvalue. To effect this, call `GCPRON' as usual on the first object in the array and then set `gcpron.nvars'.

  5. *Strings are relocated.* What this means in practice is that the pointer obtained using `XSTRING_DATA()' is liable to change at any time, and you should never keep it around past any function call, or pass it as an argument to any function that might cause a garbage collection. This is why a number of functions accept either a "non-relocatable" `char *' pointer or a relocatable Lisp string, and only access the Lisp string's data at the very last minute. In some cases, you may end up having to `alloca()' some space and copy the string's data into it.

  6. By convention, if you have to nest `GCPRO's, use `NGCPRON' (along with `struct gcpro ngcpro1, ngcpro2', etc.), `NNGCPRON', etc. This avoids compiler warnings about shadowed locals.

  7. It is *always* better to err on the side of extra `GCPRO's rather than too few. The extra cycles spent on this are almost never going to make a whit of difference in the speed of anything.

  8. The general rule to follow is that caller, not callee, `GCPRO's. That is, you should not have to explicitly `GCPRO' any Lisp objects that are passed in as parameters, but if you create any Lisp objects (remember, this happens in all sorts of circumstances, e.g. with `Fcons()', etc.), you are responsible for `GCPRO'ing the objects unless you are *absolutely sure* that there's no possibility that a garbage-collection can occur while you need to use the object. Even then, consider `GCPRO'ing.

  9. A garbage collection can occur whenever anything calls `Feval', or whenever a QUIT can occur where execution can continue past this. (Remember, this is almost anywhere.)

 10. If you have the *least smidgeon of doubt* about whether you need to `GCPRO', you should `GCPRO'.

 11. Beware of `GCPRO'ing something that is uninitialized. If you have any shade of doubt about this, initialize all your variables to `Qnil'.

 12. Be careful of traps, like calling `Fcons()' in the argument to another function. By the "caller protects" law, you should be `GCPRO'ing the newly-created cons, but you aren't. A certain number of functions that are commonly called on freshly created stuff (e.g. `nconc2()', `Fsignal()'), break the "caller protects" law and go ahead and `GCPRO' their arguments so as to simplify things, but make sure to check whether it's OK whenever doing something like this.

 13. Once again, remember to `GCPRO'! Bugs resulting from insufficient `GCPRO'ing are intermittent and extremely difficult to track down, often showing up in crashes inside of `garbage-collect' or in weirdly corrupted objects or even in incorrect values in a totally different section of code.

Given the extremely error-prone nature of the `GCPRO' scheme, and the difficulty of tracking down the resulting bugs, it should be considered a deficiency in the XEmacs code. A solution to this problem would involve implementing so-called "conservative" garbage collection for the C stack. That involves looking through all of stack memory and treating anything that looks like a reference to an object as a reference. This will result in a few objects not getting collected when they should, but it obviates the need for `GCPRO'ing, and allows garbage collection to happen at any point at all, such as during object allocation.
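
To make the chaining in the `GCPRO' rules above a little more concrete, here is a small stand-alone model of it. It is only a sketch and is not copied from the XEmacs headers: `Lisp_Object' is reduced to a plain `long', the exact link order inside one frame is an assumption, and `count_protected_slots()', `inner()', `outer()' and `protect_array()' are hypothetical names invented for the illustration. It does show why each function needs its own `struct gcpro' locals, why every `GCPRO' needs a matching `UNGCPRO', and how `nvars' lets one record cover an array.

```c
#include <assert.h>
#include <stddef.h>

typedef long Lisp_Object;       /* stand-in for the real tagged word */

struct gcpro {
  struct gcpro *next;           /* next record on the chain */
  Lisp_Object *var;             /* address of the first protected lvalue */
  int nvars;                    /* number of consecutive protected values */
};

struct gcpro *gcprolist = NULL; /* head of the chain of active records */

/* Simplified models of the macros; they expect locals named gcpro1, gcpro2. */
#define GCPRO1(v) \
  do { gcpro1.next = gcprolist; gcpro1.var = &(v); gcpro1.nvars = 1; \
       gcprolist = &gcpro1; } while (0)

#define GCPRO2(v1, v2) \
  do { gcpro1.next = gcprolist; gcpro1.var = &(v1); gcpro1.nvars = 1; \
       gcpro2.next = &gcpro1;   gcpro2.var = &(v2); gcpro2.nvars = 1; \
       gcprolist = &gcpro2; } while (0)

/* Pops every record pushed by this frame in one step. */
#define UNGCPRO do { gcprolist = gcpro1.next; } while (0)

/* Roughly what a collector does with the chain: visit every record and
   count the slots it would have to mark. */
int count_protected_slots(void)
{
  int n = 0;
  for (struct gcpro *p = gcprolist; p != NULL; p = p->next)
    n += p->nvars;
  return n;
}

/* A callee that creates a value of its own and protects it. */
int inner(void)
{
  Lisp_Object tmp = 0;          /* initialized before protecting (rule 11) */
  struct gcpro gcpro1;
  int seen;

  GCPRO1(tmp);
  seen = count_protected_slots();
  UNGCPRO;                      /* must pair with the GCPRO1 above */
  return seen;
}

/* A caller protecting two locals across a call that might collect. */
int outer(void)
{
  Lisp_Object a = 0, b = 0;
  struct gcpro gcpro1, gcpro2;
  int seen;

  GCPRO2(a, b);
  seen = inner();               /* chain now holds a, b and inner's tmp */
  UNGCPRO;
  return seen;
}

/* Rule 4: one record covering a contiguous array. */
int protect_array(void)
{
  Lisp_Object args[4] = { 0, 0, 0, 0 };
  struct gcpro gcpro1;
  int seen;

  GCPRO1(args[0]);
  gcpro1.nvars = 4;
  seen = count_protected_slots();
  UNGCPRO;
  return seen;
}
```

If `inner()' forgot its `UNGCPRO', `gcprolist' would keep pointing into a dead stack frame after the return, which is exactly the failure mode described in rule 3.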

File: internals.info, Node: Integers and Characters, Next: Allocation from Frob Blocks, Prev: GCPROing, Up: Allocation of Objects in XEmacs Lisp

Integers and Characters
=======================

Integer and character Lisp objects are created from integers using the macros `XSETINT()' and `XSETCHAR()' or the equivalent functions `make_int()' and `make_char()'. (These are actually macros on most systems.) These functions basically just do some moving of bits around, since the integral value of the object is stored directly in the `Lisp_Object'.

`XSETINT()' and the like will truncate values given to them that are too big; i.e. you won't get the value you expected but the tag bits will at least be correct.

File: internals.info, Node: Allocation from Frob Blocks, Next: lrecords, Prev: Integers and Characters, Up: Allocation of Objects in XEmacs Lisp

Allocation from Frob Blocks
===========================

The uninitialized memory required by a `Lisp_Object' of a particular type is allocated using `ALLOCATE_FIXED_TYPE()'. This only occurs inside of the lowest-level object-creating functions in `alloc.c': `Fcons()', `make_float()', `Fmake_byte_code()', `Fmake_symbol()', `allocate_extent()', `allocate_event()', `Fmake_marker()', and `make_uninit_string()'. The idea is that, for each type, there are a number of frob blocks (each 2K in size); each frob block is divided up into object-sized chunks. Each frob block will have some of these chunks that are currently assigned to objects, and perhaps some that are free. (If a frob block has nothing but free chunks, it is freed at the end of the garbage collection cycle.) The free chunks are stored in a free list, which is chained by storing a pointer in the first four bytes of the chunk. (Except for the free chunks at the end of the last frob block, which are handled using an index which points past the end of the last-allocated chunk in the last frob block.)
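
As a rough illustration of the frob-block scheme just described, the following toy allocator hands out fixed-size chunks from 2K blocks, keeps freed chunks on a free list threaded through the chunks themselves, and uses an index for the untouched tail of the newest block. It is a sketch under stated assumptions, not the code in `alloc.c': `struct toy_cons', `allocate_toy_cons()' and `free_toy_cons()' are hypothetical names, the link is kept in the first pointer-sized slot (the `car') rather than in "the first four bytes", and blocks are never returned to the system.

```c
#include <assert.h>
#include <stdlib.h>

#define FROB_BLOCK_SIZE 2048

/* The fixed-size object being allocated (two pointer-sized slots). */
struct toy_cons { void *car, *cdr; };

/* Leave room for the block's own link so a block stays within 2K. */
#define CHUNKS_PER_BLOCK \
  ((FROB_BLOCK_SIZE - sizeof (void *)) / sizeof (struct toy_cons))

struct frob_block {
  struct frob_block *prev;                  /* previously filled block */
  struct toy_cons chunks[CHUNKS_PER_BLOCK];
};

static struct frob_block *current_block = NULL;
/* Index of the first never-used chunk in current_block; starting at
   "full" forces a block to be created on the first allocation. */
static size_t current_index = CHUNKS_PER_BLOCK;
/* Freed chunks, chained through their first slot. */
static struct toy_cons *free_list = NULL;

struct toy_cons *allocate_toy_cons(void)
{
  /* 1. Prefer a chunk from the free list. */
  if (free_list != NULL) {
    struct toy_cons *c = free_list;
    free_list = c->car;                     /* follow the stored link */
    return c;
  }
  /* 2. Otherwise take the next untouched chunk of the newest block,
        creating a new block when the current one is used up. */
  if (current_index == CHUNKS_PER_BLOCK) {
    struct frob_block *b = malloc(sizeof *b);
    if (b == NULL)
      abort();
    b->prev = current_block;
    current_block = b;
    current_index = 0;
  }
  return &current_block->chunks[current_index++];
}

void free_toy_cons(struct toy_cons *c)
{
  c->car = free_list;                       /* the chunk now stores the link */
  free_list = c;
}
```

The point of the design is that both paths are a handful of instructions and involve no per-object `malloc()' bookkeeping, which matters when there are very many small objects.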

`ALLOCATE_FIXED_TYPE()' first tries to retrieve a chunk from the free list; if that fails, it calls `ALLOCATE_FIXED_TYPE_FROM_BLOCK()', which looks at the end of the last frob block for space, and creates a new frob block if there is none. (There are actually two versions of these macros, one of which is more defensive but less efficient and is used for error-checking.)

File: internals.info, Node: lrecords, Next: Low-level allocation, Prev: Allocation from Frob Blocks, Up: Allocation of Objects in XEmacs Lisp

lrecords
========

[see `lrecord.h']

All lrecords have at the beginning of their structure a `struct lrecord_header'. This just contains a pointer to a `struct lrecord_implementation', which is a structure containing method pointers and such. There is one of these for each type, and it is a global, constant, statically-declared structure that is declared in the `DEFINE_LRECORD_IMPLEMENTATION()' macro. (This macro actually declares an array of two `struct lrecord_implementation' structures. The first one contains all the standard method pointers, and is used in all normal circumstances. During garbage collection, however, the lrecord is "marked" by bumping its implementation pointer by one, so that it points to the second structure in the array. This structure contains a special indication in it that it's a "marked-object" structure: the finalize method is the special function `this_marks_a_marked_record()', and all other methods are null pointers. At the end of garbage collection, all lrecords will either be reclaimed or unmarked by decrementing their implementation pointers, so this second structure pointer will never remain past garbage collection.)

Simple lrecords (of type (c) above) just have a `struct lrecord_header' at their beginning. lcrecords, however, actually have a `struct lcrecord_header'.
This, in turn, has a `struct lrecord_header' at its beginning, so sanity is preserved; but it also has a pointer used to chain all lcrecords together, and a special ID field used to distinguish one lcrecord from another. (This field is used only for debugging and could be removed, but the space gain is not significant.)

Simple lrecords are created using `ALLOCATE_FIXED_TYPE()', just like for other frob blocks. The only change is that the implementation pointer must be initialized correctly. (The implementation structure for an lrecord, or rather the pointer to it, is named `lrecord_float', `lrecord_extent', `lrecord_buffer', etc.)

lcrecords are created using `alloc_lcrecord()'. This takes a size to allocate and an implementation pointer. (The size needs to be passed because some lcrecords, such as window configurations, are of variable size.) This basically just `malloc()'s the storage, initializes the `struct lcrecord_header', and chains the lcrecord onto the head of the list of all lcrecords, which is stored in the variable `all_lcrecords'. The calls to `alloc_lcrecord()' generally occur in the lowest-level allocation function for each lrecord type.

Whenever you create an lrecord, you need to call either `DEFINE_LRECORD_IMPLEMENTATION()' or `DEFINE_LRECORD_SEQUENCE_IMPLEMENTATION()'. This needs to be specified in a C file, at the top level. What this actually does is define and initialize the implementation structure for the lrecord. (And possibly declares a function `error_check_foo()' that implements the `XFOO()' macro when error-checking is enabled.)
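
The header layout, the two-element implementation array used for marking, and the `all_lcrecords' chain described above can be modelled in a few lines. This is a simplified sketch, not the contents of `lrecord.h': the field names (`implementation', `finalizer', `next', `uid'), the signature given to `this_marks_a_marked_record()', the `lrecord_toy' type and the helpers `record_marked_p()', `mark_record()' and `unmark_record()' are all assumptions made for the illustration.

```c
#include <assert.h>
#include <stdlib.h>

struct lrecord_implementation {
  const char *name;                   /* type name; NULL in the marked twin */
  void (*finalizer)(void *obj);       /* sentinel function in the marked twin */
};

struct lrecord_header {
  const struct lrecord_implementation *implementation;
};

struct lcrecord_header {
  struct lrecord_header lheader;      /* must come first */
  struct lcrecord_header *next;       /* chain of all lcrecords */
  unsigned int uid;                   /* debugging ID */
};

/* Never meant to be called; its address identifies the marked twin. */
void this_marks_a_marked_record(void *obj)
{
  (void) obj;
  abort();
}

/* What the defining macro is described as producing: element 0 is the
   normal implementation, element 1 the "marked-object" twin. */
const struct lrecord_implementation lrecord_toy[2] = {
  { "toy", NULL },
  { NULL, this_marks_a_marked_record }
};

struct lcrecord_header *all_lcrecords = NULL;
static unsigned int lcrecord_uid_counter = 0;

/* malloc the storage, fill in the header, push onto the global chain. */
void *alloc_lcrecord(size_t size, const struct lrecord_implementation *impl)
{
  struct lcrecord_header *h;

  assert(size >= sizeof *h);
  h = malloc(size);
  if (h == NULL)
    abort();
  h->lheader.implementation = impl;
  h->next = all_lcrecords;
  h->uid = lcrecord_uid_counter++;
  all_lcrecords = h;
  return h;
}

int record_marked_p(const struct lrecord_header *h)
{
  return h->implementation->finalizer == this_marks_a_marked_record;
}

/* Marking bumps the pointer to the second array element ... */
void mark_record(struct lrecord_header *h)
{
  if (!record_marked_p(h))
    h->implementation++;
}

/* ... and unmarking at the end of collection steps it back. */
void unmark_record(struct lrecord_header *h)
{
  if (record_marked_p(h))
    h->implementation--;
}
```

Because a marked record no longer points at the normal implementation structure, any type test that compares the implementation pointer fails while the mark is set, which is why the `GC_FOOBARP()' variants exist.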

The arguments to the macros are the actual type name (this is used to construct the C variable name of the lrecord implementation structure and related structures using the `##' macro concatenation operator), a string that names the type on the Lisp level (this may not be the same as the C type name; typically, the C type name has underscores, while the Lisp string has dashes), various method pointers, and the name of the C structure that contains the object. The methods are used to encapsulate type-specific information about the object, such as how to print it or mark it for garbage collection, so that it's easy to add new object types without having to add a specific case for each new type in a bunch of different places.

The difference between `DEFINE_LRECORD_IMPLEMENTATION()' and `DEFINE_LRECORD_SEQUENCE_IMPLEMENTATION()' is that the former is used for fixed-size object types and the latter is for variable-size object types. Most object types are fixed-size; some complex types, however (e.g. window configurations), are variable-size. Variable-size object types have an extra method, which is called to determine the actual size of a particular object of that type. (Currently this is only used for keeping allocation statistics.)

For the purpose of keeping allocation statistics, the allocation engine keeps a list of all the different types that exist. Note that, since `DEFINE_LRECORD_IMPLEMENTATION()' is a macro that is specified at top-level, there is no way for it to add to the list of all existing types. What happens instead is that each implementation structure contains in it a dynamically assigned number that is particular to that type. (Or rather, it contains a pointer to another structure that contains this number. This evasiveness is done so that the implementation structure can be declared const.) In the sweep stage of garbage collection, each lrecord is examined to see if its implementation structure has its dynamically-assigned number set.
If not, it must be a new type, and it is added to the list of known types and a new number assigned. The number is used to index into an array holding the number of objects of each type and the total memory allocated for objects of that type. The statistics in this array are also computed during the sweep stage. These statistics are returned by the call to `garbage-collect' and are printed out at the end of the loadup phase.

Note that for every type defined with a `DEFINE_LRECORD_*()' macro, there needs to be a `DECLARE_LRECORD_IMPLEMENTATION()' somewhere in a `.h' file, and this `.h' file needs to be included by `inline.c'.

Furthermore, there should generally be a set of `XFOOBAR()', `FOOBARP()', etc. macros in a `.h' (or occasionally `.c') file. To create one of these, copy an existing model and modify as necessary.

The various methods in the lrecord implementation structure are:

  1. A "mark" method. This is called during the marking stage and passed a function pointer (usually the `mark_object()' function), which is used to mark an object. All Lisp objects that are contained within the object need to be marked by applying this function to them. The mark method should also return a Lisp object, which should be either nil or an object to mark. (This can be used in lieu of calling `mark_object()' on the object, to reduce the recursion depth, and consequently should be the most heavily nested sub-object, such as a long list.)

     *Note*: When the mark method is called, garbage collection is in progress, and special precautions need to be taken when accessing objects; see section (B) above.

     If your mark method does not need to do anything, it can be `NULL'.

  2. A "print" method. This is called to create a printed representation of the object, whenever `princ', `prin1', or the like is called.
     It is passed the object, a stream to which the output is to be directed, and an `escapeflag' which indicates whether the object's printed representation should be "escaped" so that it is readable. (This corresponds to the difference between `princ' and `prin1'.) Basically, "escaped" means that strings will have quotes around them and confusing characters in the strings such as quotes, backslashes, and newlines will be backslashed; and that special care will be taken to make symbols print in a readable fashion (e.g. symbols that look like numbers will be backslashed). Other readable objects should perhaps pass `escapeflag' on when sub-objects are printed, so that readability is preserved when necessary (or if not, always pass in a 1 for `escapeflag'). Non-readable objects should in general ignore `escapeflag', except that some use it as an indication that more verbose output should be given.

     Sub-objects are printed using `print_internal()', which takes exactly the same arguments as are passed to the print method.

     Literal C strings should be printed using `write_c_string()', or `write_string_1()' for non-null-terminated strings.

     Functions that do not have a readable representation should check the `print_readably' flag and signal an error if it is set.

     If you specify NULL for the print method, the `default_object_printer()' will be used.

  3. A "finalize" method. This is called at the beginning of the sweep stage on lcrecords that are about to be freed, and should be used to perform any extra object cleanup. This typically involves freeing any extra `malloc()'ed memory associated with the object, releasing any operating-system and window-system resources associated with the object (e.g. pixmaps, fonts), etc.

     The finalize method can be NULL if nothing needs to be done.

     WARNING #1: The finalize method is also called at the end of the dump phase; this time with the for_disksave parameter set to non-zero.
     The object is *not* about to disappear, so you have to make sure to *not* free any extra `malloc()'ed memory if you're going to need it later. (Also, signal an error if there are any operating-system and window-system resources here, because they can't be dumped.)

     Finalize methods should, as a rule, set to zero any pointers after they've been freed, and check to make sure pointers are not zero before freeing. Although I'm pretty sure that finalize methods are not called twice on the same object (except for the `for_disksave' proviso), we've gotten nastily burned in some cases by not doing this.

     WARNING #2: The finalize method is *only* called for lcrecords, *not* for simple lrecords. If you need a finalize method for simple lrecords, you have to stick it in the `ADDITIONAL_FREE_foo()' macro in `alloc.c'.

     WARNING #3: Things are in an *extremely* bizarre state when `ADDITIONAL_FREE_foo()' is called, so you have to be incredibly careful when writing one of these functions. See the comment in `gc_sweep()'. If you ever have to add one of these, consider using an lcrecord or dealing with the problem in a different fashion.

  4. An "equal" method. This compares the two objects for similarity, when `equal' is called. It should compare the contents of the objects in some reasonable fashion. It is passed the two objects and a "depth" value, which is used to catch circular objects. To compare sub-Lisp-objects, call `internal_equal()' and bump the depth value by one. If this value gets too high, a `circular-object' error will be signaled. If this is NULL, objects are `equal' only when they are `eq', i.e. identical.

  5. A "hash" method. This is used to hash objects when they are to be compared with `equal'. The rule here is that if two objects are `equal', they *must* hash to the same value; i.e. your hash function should use some subset of the sub-fields of the object that are compared in the "equal" method.
If you specify this method as `NULL', the object's pointer will be used as the hash, which will *fail* if the object has an `equal' method, so don't do this.

To hash a sub-Lisp-object, call `internal_hash()'. Bump the depth by one, just like in the "equal" method.

To convert a Lisp object directly into a hash value (using its pointer), use `LISP_HASH()'. This is what happens when the hash method is NULL.

To hash two or more values together into a single value, use `HASH2()', `HASH3()', `HASH4()', etc.

6. "getprop", "putprop", "remprop", and "plist" methods. These are used for object types that have properties. I don't feel like documenting them here. If you create one of these objects, you have to use different macros to define them, i.e. `DEFINE_LRECORD_IMPLEMENTATION_WITH_PROPS()' or `DEFINE_LRECORD_SEQUENCE_IMPLEMENTATION_WITH_PROPS()'.

7. A "size_in_bytes" method, when the object is of variable size (i.e. declared with a `_SEQUENCE_IMPLEMENTATION' macro). This should simply return the object's size in bytes, exactly as you might expect. For an example, see the methods for window configurations and opaques.

File: internals.info, Node: Low-level allocation, Next: Pure Space, Prev: lrecords, Up: Allocation of Objects in XEmacs Lisp

Low-level allocation
====================

Memory that you want to allocate directly should be allocated using `xmalloc()' rather than `malloc()'. This implements error-checking on the return value, and once upon a time did some more vital stuff (i.e. `BLOCK_INPUT', which is no longer necessary). Free using `xfree()', and realloc using `xrealloc()'.

Note that `xmalloc()' will do a non-local exit if the memory can't be allocated. (Many functions, however, do not expect this, and thus XEmacs will likely crash if this happens. *This is a bug.* If you can, you should strive to make your function handle this OK. However, it's difficult in the general circumstance, perhaps requiring extra unwind-protects and such.)
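To make the non-local-exit hazard concrete, here is a minimal sketch in plain C. It is not the XEmacs code: every `sketch_' name is hypothetical, `setjmp()'/`longjmp()' merely stand in for whatever exit mechanism `xmalloc()' really uses, and the swappable allocator exists only so the failure path can be exercised.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>
#include <stdlib.h>

/* All names here are hypothetical; this only imitates the behaviour
   described in the text, using setjmp/longjmp as the non-local exit. */
typedef void *(*sketch_alloc_fn) (size_t);

static sketch_alloc_fn sketch_allocator = malloc;
static jmp_buf sketch_memory_full_exit;

/* Checked allocation: never returns NULL, exits non-locally instead. */
static void *
sketch_xmalloc (size_t size)
{
  void *p = sketch_allocator (size ? size : 1);
  if (p == NULL)
    longjmp (sketch_memory_full_exit, 1);
  return p;
}

static void
sketch_xfree (void *p)
{
  free (p);
}

/* An allocator that always fails, used to force the error path. */
static void *
sketch_failing_alloc (size_t size)
{
  (void) size;
  return NULL;
}

/* Allocates two buffers.  Returns 1 on success, 0 if an allocation
   failed.  The first buffer is released on the failure path, which is
   the "handle this OK" behaviour the text asks for. */
static int
sketch_make_pair (size_t n, int fail_second)
{
  void *volatile first = NULL;  /* volatile: it is read after longjmp */
  void *second;

  assert (n > 0);
  if (setjmp (sketch_memory_full_exit) != 0)
    {
      /* We got here through the non-local exit: undo partial work. */
      sketch_xfree (first);
      sketch_allocator = malloc;
      return 0;
    }

  first = sketch_xmalloc (n);
  if (fail_second)
    sketch_allocator = sketch_failing_alloc;
  second = sketch_xmalloc (n);

  sketch_xfree (second);
  sketch_xfree (first);
  return 1;
}
```

Calling `sketch_make_pair (16, 1)' takes the failure path and still frees the first buffer. A function written without the recovery point would simply lose control in the middle of its work, which is the situation the paragraph above calls a bug.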
Note that XEmacs provides two separate replacements for the standard `malloc()' library function. These are called "old GNU malloc" (`malloc.c') and "new GNU malloc" (`gmalloc.c'), respectively. New GNU malloc is better in pretty much every way than old GNU malloc, and should be used if possible. (It used to be that on some systems, the old one worked but the new one didn't. I think this was due specifically to a bug in SunOS, which the new one now works around; so I don't think the old one ever has to be used any more.)

The primary difference between both of these mallocs and the standard system malloc is that they are much faster, at the expense of increased space. The basic idea is that memory is allocated in fixed chunks of powers of two. This allows for basically constant malloc time, since the various chunks can just be kept on a number of free lists. (The standard system malloc typically allocates arbitrary-sized chunks and has to spend some time, sometimes a significant amount of time, walking the heap looking for a free block to use and cleaning things up.) The new GNU malloc improves on things by allocating large objects in chunks of 4096 bytes rather than in ever larger powers of two, which would result in ever larger wastage. There is a slight speed loss here, but it's of doubtful significance.

NOTE: Apparently there is a third-generation GNU malloc that is significantly better than the new GNU malloc, and should probably be included in XEmacs.

There is also the relocating allocator, `ralloc.c'. This actually moves blocks of memory around so that the `sbrk()' pointer can be shrunk and virtual memory released back to the system. On some systems, this is a big win. On all systems, it causes a noticeable (and sometimes huge) speed penalty, so I turn it off by default. `ralloc.c' only works with the new GNU malloc in `gmalloc.c'. There are also two versions of `ralloc.c', one that uses `mmap()' rather than block copies to move data around.
This purports to be faster, although that depends on the amount of data that would have had to be block copied and the system-call overhead for `mmap()'. I don't know exactly how this works, except that the relocating-allocation routines are pretty much used only for the memory allocated for a buffer, which is the biggest consumer of space, esp. of space that may get freed later.

Note that the GNU mallocs have some "memory warning" facilities. XEmacs taps into them and issues a warning through the standard warning system, when memory gets to 75%, 85%, and 95% full. (On some systems, the memory warnings are not functional.)

Allocated memory that is going to be used to make a Lisp object is created using `allocate_lisp_storage()'. This calls `xmalloc()' but also verifies that the pointer to the memory can fit into a Lisp word (remember that some bits are taken away for a type tag and a mark bit). If not, an error is issued through `memory_full()'. `allocate_lisp_storage()' is called by `alloc_lcrecord()', `ALLOCATE_FIXED_TYPE()', and the vector and bit-vector creation routines. These routines also call `INCREMENT_CONS_COUNTER()' at the appropriate times; this keeps statistics on how much memory is allocated, so that garbage-collection can be invoked when the threshold is reached.

File: internals.info, Node: Pure Space, Next: Cons, Prev: Low-level allocation, Up: Allocation of Objects in XEmacs Lisp

Pure Space
==========

Not yet documented.

File: internals.info, Node: Cons, Next: Vector, Prev: Pure Space, Up: Allocation of Objects in XEmacs Lisp

Cons
====

Conses are allocated in standard frob blocks. The only thing to note is that conses can be explicitly freed using `free_cons()' and associated functions `free_list()' and `free_alist()'. This immediately puts the conses onto the cons free list, and decrements the statistics on memory allocation appropriately.
This is used to good effect by some extremely commonly-used code, to avoid generating extra objects and thereby triggering GC sooner. However, you have to be *extremely* careful when doing this. If you mess this up, you will get BADLY BURNED, and it has happened before.

File: internals.info, Node: Vector, Next: Bit Vector, Prev: Cons, Up: Allocation of Objects in XEmacs Lisp

Vector
======

As mentioned above, each vector is `malloc()'ed individually, and all are threaded through the variable `all_vectors'. Vectors are marked strangely during garbage collection, by kludging the size field. Note that the `struct Lisp_Vector' is declared with its `contents' field being a *stretchy* array of one element. It is actually `malloc()'ed with the right size, however, and access to any element through the `contents' array works fine.

File: internals.info, Node: Bit Vector, Next: Symbol, Prev: Vector, Up: Allocation of Objects in XEmacs Lisp

Bit Vector
==========

Bit vectors work exactly like vectors, except for more complicated code to access an individual bit, and except for the fact that bit vectors are lrecords while vectors are not. (The only difference here is that there's an lrecord implementation pointer at the beginning and the tag field in bit vector Lisp words is "lrecord" rather than "vector".)

File: internals.info, Node: Symbol, Next: Marker, Prev: Bit Vector, Up: Allocation of Objects in XEmacs Lisp

Symbol
======

Symbols are also allocated in frob blocks. Note that the code exists for symbols to be either lrecords (category (c) above) or simple types (category (b) above), and that they are lrecords by default (I think), although there is no good reason for this.

Note that symbols in the awful horrible obarray structure are chained through their `next' field.

Remember that `intern' looks up a symbol in an obarray, creating one if necessary.
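The lookup-or-create behaviour of `intern', together with chaining through a `next' field, can be pictured with the following sketch. It is only an illustration under stated assumptions: the `sketch_' types, the bucket count, and the hash function are all hypothetical and are not taken from the XEmacs sources, where symbols are Lisp objects rather than plain C structs.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_OBARRAY_SIZE 31   /* arbitrary bucket count for the sketch */

/* Hypothetical stand-in for a symbol: just a name plus the chain link. */
struct sketch_symbol
{
  char *name;
  struct sketch_symbol *next;    /* next symbol in the same bucket */
};

struct sketch_obarray
{
  struct sketch_symbol *buckets[SKETCH_OBARRAY_SIZE];
};

/* Simple string hash (an assumption, not the real obarray hash). */
static unsigned
sketch_hash_name (const char *name)
{
  unsigned h = 0;
  while (*name)
    h = h * 31u + (unsigned char) *name++;
  return h % SKETCH_OBARRAY_SIZE;
}

/* Look NAME up in OB; if it is absent, create it and chain it in.
   Returns NULL only if memory runs out. */
static struct sketch_symbol *
sketch_intern (struct sketch_obarray *ob, const char *name)
{
  unsigned slot;
  struct sketch_symbol *sym;

  assert (ob != NULL && name != NULL);
  slot = sketch_hash_name (name);

  /* Walk the bucket's chain through the `next' fields. */
  for (sym = ob->buckets[slot]; sym != NULL; sym = sym->next)
    if (strcmp (sym->name, name) == 0)
      return sym;

  sym = malloc (sizeof *sym);
  if (sym == NULL)
    return NULL;
  sym->name = malloc (strlen (name) + 1);
  if (sym->name == NULL)
    {
      free (sym);
      return NULL;
    }
  strcpy (sym->name, name);

  /* Push the new symbol onto the front of its bucket's chain. */
  sym->next = ob->buckets[slot];
  ob->buckets[slot] = sym;
  return sym;
}
```

Interning the same name twice returns the same pointer, which is the property that lets symbols be compared by identity.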
File: internals.info, Node: Marker, Next: String, Prev: Symbol, Up: Allocation of Objects in XEmacs Lisp

Marker
======

Markers are allocated in frob blocks, as usual. They are kept in a buffer unordered, but in a doubly-linked list so that they can easily be removed. (Formerly this was a singly-linked list, but in some cases garbage collection took an extraordinarily long time due to the O(N^2) time required to remove lots of markers from a buffer.)

Markers are removed from a buffer in the finalize stage, in `ADDITIONAL_FREE_marker()'.

File: internals.info, Node: String, Next: Bytecode, Prev: Marker, Up: Allocation of Objects in XEmacs Lisp

String
======

As mentioned above, strings are a special case. A string is logically two parts, a fixed-size object (containing the length, property list, and a pointer to the actual data), and the actual data in the string. The fixed-size object is a `struct Lisp_String' and is allocated in frob blocks, as usual. The actual data is stored in special "string-chars blocks", which are 8K blocks of memory. Currently-allocated strings are simply laid end to end in these string-chars blocks, with a pointer back to the `struct Lisp_String' stored before each string in the string-chars block. When a new string needs to be allocated, the remaining space at the end of the last string-chars block is used if there's enough, and a new string-chars block is created otherwise.

There are never any holes in the string-chars blocks due to the string compaction and relocation that happens at the end of garbage collection. During the sweep stage of garbage collection, when objects are reclaimed, the garbage collector goes through all string-chars blocks, looking for unused strings. Each chunk of string data is preceded by a pointer to the corresponding `struct Lisp_String', which indicates both whether the string is used and how big the string is, i.e. how to get to the next chunk of string data.
Holes are compressed by block-copying the next string into the empty space and relocating the pointer stored in the corresponding `struct Lisp_String'. *This means you have to be careful with strings in your code.* See the section above on `GCPRO'ing.

Note that there is one situation not handled: a string that is too big to fit into a string-chars block. Such strings, called "big strings", are all `malloc()'ed as their own block. (#### Although it would make more sense for the threshold for big strings to be somewhat lower, e.g. 1/2 or 1/4 the size of a string-chars block. It seems that this was indeed the case formerly - indeed, the threshold was set at 1/8 - but Mly forgot about this when rewriting things for 19.8.)

Note also that the string data in string-chars blocks is padded as necessary so that proper alignment constraints on the `struct Lisp_String' back pointers are maintained.

Finally, strings can be resized. This happens in Mule when a character is substituted with a different-length character, or during modeline frobbing. (You could also export this to Lisp, but it's not done so currently.) Resizing a string is a potentially tricky process. If the change is small enough that the padding can absorb it, nothing other than a simple memory move needs to be done. Keep in mind, however, that the string can't shrink too much because the offset to the next string in the string-chars block is computed by looking at the length and rounding to the nearest multiple of four or eight. If the string would shrink or expand beyond the correct padding, new string data needs to be allocated at the end of the last string-chars block and the data moved appropriately.
This leaves some dead string data, which is marked by putting a special marker of 0xFFFFFFFF in the `struct Lisp_String' pointer before the data (there's no real `struct Lisp_String' to point to and relocate), and storing the size of the dead string data (which would normally be obtained from the now-non-existent `struct Lisp_String') at the beginning of the dead string data gap. The string compactor recognizes this special 0xFFFFFFFF marker and handles it correctly.

File: internals.info, Node: Bytecode, Prev: String, Up: Allocation of Objects in XEmacs Lisp

Bytecode
========

Not yet documented.

File: internals.info, Node: Events and the Event Loop, Next: Evaluation; Stack Frames; Bindings, Prev: Allocation of Objects in XEmacs Lisp, Up: Top

Events and the Event Loop
*************************

* Menu:

* Introduction to Events::
* Main Loop::
* Specifics of the Event Gathering Mechanism::
* Specifics About the Emacs Event::
* The Event Stream Callback Routines::
* Other Event Loop Functions::
* Converting Events::
* Dispatching Events; The Command Builder::

File: internals.info, Node: Introduction to Events, Next: Main Loop, Up: Events and the Event Loop

Introduction to Events
======================

An event is an object that encapsulates information about an interesting occurrence in the operating system. Events are generated either by user action, direct (e.g. typing on the keyboard or moving the mouse) or indirect (moving another window, thereby generating an expose event on an Emacs frame), or as a result of some other typically asynchronous action happening, such as output from a subprocess being ready or a timer expiring. Events come into the system in an asynchronous fashion (typically through a callback being called) and are converted into a synchronous event queue (first-in, first-out) in a process that we will call "collection".

Note that each application has its own event queue.
(It is immaterial whether the collection process directly puts the events in the proper application's queue, or puts them into a single system queue, which is later split up.)

The most basic level of event collection is done by the operating system or window system. Typically, XEmacs does its own event collection as well. Often there are multiple layers of collection in XEmacs, with events from various sources being collected into a queue, which is then combined with other sources to go into another queue (i.e. a second level of collection), with perhaps another level on top of this, etc.

XEmacs has its own types of events (called "Emacs events"), which provide an abstract layer on top of the system-dependent nature of the most basic events that are received. Part of the complex nature of the XEmacs event collection process involves converting from the operating-system events into the proper Emacs events - there may not be a one-to-one correspondence.

Emacs events are documented in `events.h'; I'll discuss them later.
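A single level of "collection" as described above might look roughly like the sketch below: a callback-side function converts a system event into zero or one abstract events and appends it to a first-in, first-out queue, which the main loop later drains synchronously. The event kinds, struct layouts, and every `sketch_' name are invented for illustration and are not the definitions from `events.h'.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical system-level events, as a window system might deliver. */
enum sketch_sys_type { SYS_KEY_DOWN, SYS_KEY_UP, SYS_EXPOSE };
struct sketch_sys_event { enum sketch_sys_type type; int detail; };

/* Hypothetical abstract events, standing in for "Emacs events". */
enum sketch_emacs_type { EV_KEY_PRESS, EV_REDISPLAY_NEEDED };
struct sketch_emacs_event { enum sketch_emacs_type type; int detail; };

#define SKETCH_QUEUE_CAP 64

/* Fixed-size ring buffer used as the synchronous FIFO queue. */
struct sketch_queue
{
  struct sketch_emacs_event items[SKETCH_QUEUE_CAP];
  size_t head;    /* index of the oldest queued event */
  size_t count;   /* number of queued events */
};

static void
sketch_queue_init (struct sketch_queue *q)
{
  assert (q != NULL);
  q->head = 0;
  q->count = 0;
}

/* Append EV at the tail.  Returns 1 on success, 0 if the queue is full. */
static int
sketch_enqueue (struct sketch_queue *q, struct sketch_emacs_event ev)
{
  if (q->count == SKETCH_QUEUE_CAP)
    return 0;
  q->items[(q->head + q->count) % SKETCH_QUEUE_CAP] = ev;
  q->count++;
  return 1;
}

/* Remove the oldest event into *OUT.  Returns 0 if the queue is empty. */
static int
sketch_dequeue (struct sketch_queue *q, struct sketch_emacs_event *out)
{
  if (q->count == 0)
    return 0;
  *out = q->items[q->head];
  q->head = (q->head + 1) % SKETCH_QUEUE_CAP;
  q->count--;
  return 1;
}

/* The collection step, as it might be called from an asynchronous
   callback.  Returns how many abstract events were queued; key
   releases produce none, so the mapping is not one-to-one. */
static int
sketch_collect (struct sketch_queue *q, struct sketch_sys_event sys)
{
  struct sketch_emacs_event ev;

  switch (sys.type)
    {
    case SYS_KEY_DOWN:
      ev.type = EV_KEY_PRESS;
      ev.detail = sys.detail;
      return sketch_enqueue (q, ev);
    case SYS_EXPOSE:
      ev.type = EV_REDISPLAY_NEEDED;
      ev.detail = sys.detail;
      return sketch_enqueue (q, ev);
    case SYS_KEY_UP:
    default:
      return 0;
    }
}
```

A second level of collection would simply be another such queue whose producer reads from this one and from other sources. The fixed capacity is a simplification I chose for the sketch; a real collector would need a policy for what to do when events arrive faster than they are consumed.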